CHAPTER 17 More of a Good Thing: Multiple Regression 247
meaning models with the same outcome variable, but different groups of predic-
tors. You also use some sort of strategy in choosing the order in which you intro-
duce the predictors into the iterative models, which is described in Chapter 20. So
imagine that you used our example data set and — in one iteration — ran a model
to predict SBP with Age and other predictors in it, and the coefficient for Age was
statistically significant. Now, imagine you added Weight to that model, and in the
new model, Age was no longer statistically significant! You’ve just been visited by
the collinearity fairy.
In the example from Table 17-2, there’s a statistically significant positive correla-
tion between each predictor and the outcome. We figured this out when running
the correlations for Figure 17-1, but you could check our work by running the data
from Table 17-2 through a straight-line regression, as described in Chapter 16. In
contrast, the multiple regression output in Figure 17-2 shows that neither Age nor
Weight is statistically significant in the model, meaning that neither has a
regression coefficient that is statistically significantly different from zero! Why
are they associated with the outcome in correlation analysis but not in multiple
regression?
The answer is collinearity. In the regression world, the term collinearity (also
called multicollinearity) refers to a strong correlation between two or more of the
predictor variables. If you run a correlation between Age and Weight (the two pre-
dictors), you’ll find that they’re statistically significantly correlated with each
other. This situation is what destroys the statistically significant p values you
see on some predictors in iterative models when doing multiple regression.
The problem with collinearity is that you cannot tell which of the two predictor
variables is actually influencing the outcome more, because they are fighting over
explaining the variability in the dependent variable. Although models with col-
linearity are valid, they are hard to interpret if you are looking for cause-and-
effect relationships, meaning you are doing causal inference. Chapter 20 provides
philosophical guidance on dealing with collinearity in modeling.
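One standard way to quantify how badly two predictors are fighting over the same variability is the variance inflation factor (VIF), which grows as the correlation between predictors grows. The chapter doesn't cover VIF explicitly, so treat the following as a supplementary sketch; the age and weight values are made up for illustration, and the two-predictor VIF formula used is 1 / (1 − r²), where r is the correlation between the predictors.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor when a model has exactly two predictors.
    VIF = 1 means no collinearity; values much above ~5-10 are a red flag."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical Age and Weight values, deliberately made to rise together
age = [25, 34, 41, 52, 63, 70]
weight = [60, 68, 72, 80, 88, 95]

print("r between predictors:", round(pearson_r(age, weight), 3))
print("VIF:", round(vif_two_predictors(age, weight), 1))
```

With predictors this strongly correlated, the VIF is enormous, which is exactly the situation where a coefficient that was significant in a one-predictor model loses significance when the second predictor joins the model.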
Calculating How Many Participants
You Need
Studies should aim to enroll a sample large enough that you have a good chance of
getting a statistically significant result for your primary research hypothesis,
assuming the effect you're testing is large enough to be of clinical
importance. So if the main hypothesis of your study is going to be tested by a
multiple regression analysis, you should theoretically do a calculation to deter-
mine the sample size you need to support that analysis.
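Before doing a formal power calculation, many analysts sanity-check their plans against a common rule of thumb of roughly 10 to 20 participants per predictor in the model. This rule is a rough guide, not a substitute for a proper calculation, and the middle value of 15 used below is an arbitrary choice for illustration.

```python
def rule_of_thumb_n(num_predictors, per_predictor=15):
    """Rough minimum sample size for a multiple regression model,
    using the common (and debated) guideline of about 10-20
    participants per predictor; 15 is an arbitrary middle value."""
    return num_predictors * per_predictor

# Two predictors (Age and Weight), as in the SBP example
print("Suggested minimum n:", rule_of_thumb_n(2))
```

A formal sample-size calculation for multiple regression would instead work from the expected effect size, the desired power, and the significance level.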